ABSTRACT

The study of forecasting models that use large-scale microblog discussions and search behavior data can provide good insight into market movements. In this work we collected a dataset of 2 million tweets and the search volume index (SVI, from Google) for the period June 2010 to September 2011. We study a comprehensive set of causative relationships and develop a unified modeling approach for various market securities: equity indices (Dow Jones Industrial Average, DJIA, and NASDAQ-100), commodity markets (oil and gold) and Euro forex rates. We also investigate the lagged and statistically causative relations of Twitter sentiments developed during active trading days and market-inactive days, in combination with the search behavior of the public before any change in the prices/indices. Our results show the extent of lagged significance, with correlation values of up to 0.82 between search volumes and the gold price in USD. We find weekly accuracy in direction (up and down prediction) of up to 94.3% for DJIA and 90% for NASDAQ-100, with significant reduction in mean absolute percentage error for all the forecasting models.
1. INTRODUCTION

Most earlier work in computational finance builds on the efficient market hypothesis (EMH), which asserts that market movements at the present level are a function of already existing news, whispers and the future valuation of dividends of a stock/company [9][14]. However, research by Qian et al. shows that markets are not fully efficient [16]. Behavioral finance has attracted high interest from the financial community [20]. It challenges the very existence of efficient markets by treating human emotion and the macro-level social mood as a vital part of investment decisions. For example, at the micro level, consistently rising stocks are an indication to sell in order to hold the profits and perform subsequent portfolio adjustments. However, surprising index trends are observed at the macro-economic level. Another example of positive stock sentiment resulting in negative price movement is observed when many people exude high confidence while making a buy decision for a commodity or stock, causing the price to rise so steeply that it falls instead of stabilizing. To elaborate on this point, we will discuss results showing how bullish Twitter sentiments can yield negative correlations with the DJIA index (in other words, bearish outcomes).

This era of web technology is marked by high entropy in information spread as well as retrieval [3]. In the late 1990s, before the spread of the social web, information regarding commodities/currency rates and buy/sell sentiments took a long time (maybe even a full day) to disseminate fully in the investor community. Also, companies and markets took a long time (weeks or months) to calm market rumors, news or false information. Today such information spreads far more quickly, which gives researchers an opportunity to develop web mining platforms targeted at extracting relevant financial insights from social media and the web.

In the social web mining context, researchers have taken two distinct approaches to market prediction. Firstly, social media feeds (such as the Twitter API) can provide an important resource for measuring investor mood at a comprehensive scale [8][7][18][1][12]. Secondly, search volumes (e.g. from Google search) related to financial market instruments (stocks, bonds, indices, commodities etc.) have been shown to exhibit predictive and causative relationships with market returns [19].

Bollen et al. used dimensions of the Google-Profile of Mood States to reflect changes in the closing price of DJIA. Another work by Mao et al. covers the effect of search volume data, together with preliminary sentiment indices of the entire Twitter feed, on stock market movements of DJIA and the volatility index of commodities like gold [11]. Zhang et al. made use of two dimensions of human behavior, fear and hope, to show correlations with stock market indicators [21]. However, these approaches have been restricted to investor sentiment from only one macro-economic perspective, and are not complete or flexible enough to explain dynamics that can be extended to the individual stock index of a company. Sprenger et al. analyzed individual stocks of S&P 100 companies and tried correlating tweet features of the discussions about particular companies containing the ticker symbol [18]. This paper is an incremental step towards a flexible and novel approach that combines search behavior with sentiment analysis and is scalable (easily modified) to both individual commodities and stocks/companies. The approach can be further exploited to build successful hedging strategies, making the wisdom of the crowd usable even by a single investor.

Figure 1: Flowchart of the proposed methodology showing the various phases of sentiment analysis, beginning with SVI/tweet collection and ending with stock future prediction. In the final phase three sets of results are presented: (1) correlation results for Twitter sentiments and stock prices for different companies; (2) Granger causality analysis to show that stock prices are affected in the short term by Twitter sentiments; (3) EMMS for quantitative comparison in stock market prediction using tweet features.

In this paper, we present a comprehensive study of relationships over a wide range of market securities (commodities such as oil and gold, forex rates of the Euro, and equity markets such as DJIA and NASDAQ-100) with the dynamic features of investor behavior as reflected in the opinions emerging on Twitter and trends in search engine volumes. A summary of the whole study is provided in Figure 1. In Section 2 we present data collection and prior processing, which explains the terminology used in the market securities and social mood series. In Section 3 we present the statistical techniques implemented, discuss the results and draw conclusions. Future prospects of the work are given in Section 5.

2. DATA COLLECTION AND PROCESSING 

In this section, we discuss the collection of various financial 
data series used in this paper. 

2.1 Tweets Extraction and Processing 

Tweets are made accessible through a simple keyword search (for various market securities in our case) via an application programming interface (API). In this work, we have used tweets from a period of 15 months and 10 days, between June 2nd 2010 and 13th September 2011. During this period, by querying the Twitter search API for each of the market features under study (say Gold, Euro, Dow etc.), we collected 1,964,044 English language tweets (by around 0.71M users). Each tweet record contains (a) tweet identifier, (b) date/time of submission (in GMT), (c) language and (d) text. Subsequently, stop words and punctuation are removed and the tweets are grouped by day (which is the finest time resolution in this study, since we do not group tweets further by hours/minutes).

^Twitter API is easily accessible at https://dev.twitter.com/docs. Also Gnip (http://gnip.com/twitter), the premium platform available for purchasing the historic and present public firehose of tweets, has many investors as financial customers researching in the area, though due to confidentiality issues they are not explicitly named.
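The cleaning and daily grouping step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record field names (`created_at`, `text`), the timestamp format and the tiny stop-word list are all assumptions made for the example.

```python
import string
from collections import defaultdict
from datetime import datetime

# Hypothetical, deliberately tiny stop-word list; the paper does not say which list was used.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on", "and", "for"}


def clean_text(text):
    """Lower-case the text, strip ASCII punctuation and drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(word for word in no_punct.split() if word not in STOP_WORDS)


def group_by_day(tweets):
    """Group cleaned tweet texts by calendar day (timestamps assumed to be GMT)."""
    by_day = defaultdict(list)
    for tweet in tweets:
        day = datetime.strptime(tweet["created_at"], "%Y-%m-%d %H:%M:%S").date()
        by_day[day].append(clean_text(tweet["text"]))
    return dict(by_day)


# Hypothetical sample records with the four fields listed in the text.
sample = [
    {"id": 1, "created_at": "2011-06-02 14:05:00", "lang": "en", "text": "Gold is on the rise!"},
    {"id": 2, "created_at": "2011-06-02 21:30:00", "lang": "en", "text": "Selling the Dow, too risky."},
    {"id": 3, "created_at": "2011-06-03 09:00:00", "lang": "en", "text": "Euro to slide?"},
]
daily = group_by_day(sample)
```

Grouping by day only, rather than by hour or minute, mirrors the finest time resolution used in the study.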

2.1.1 Tweet Sentiment Extraction

In order to compute the sentiment of any tweet, we classify each incoming tweet every day as positive or negative using a naive Bayes classifier. For each day, the total number of positive tweets is aggregated as Positive_day and the total number of negative tweets as Negative_day. We have made use of the lexicon/JSON API from Twittersentiment, a service provided by the Stanford NLP research group [8]. Their training was done over a dataset of 1,600,000 tweets and the classifier achieved an accuracy of about 82.7%. The online classifier uses the Naive Bayesian classification method, which is one of the most successful and highly researched algorithms for natural language processing classification. It is known to give superior performance to other methods in the context of tweets [8]. Because of the limitations of state-of-the-art natural language processing algorithms, the accuracy of the classifier decreases tremendously when the number of mood states (i.e. the number of classes) is increased. Such a decrease in sentiment accuracy would hurt the prediction accuracy, whereas with two classes the differences in the rate of change of sentiment can be measured with significant precision. Naive Bayesian classification methods have high replicability and few arbitrary fine-tuning elements. Our data has shown that the residual 17.3% of the tweets (misclassifications) are equally distributed over the two classes, which does not affect the overall prediction accuracy, as all we are interested in is the rate of change of the sentiment index over a period of time.

^https://sites.google.com/site/twittersentimenthelp/

In our dataset roughly 67.14% of the tweets are positive, while 32.86% are negative for the market securities under study. This indicates that stock/commodity discussions are much more balanced in terms of agreement than chat and internet board messages, where the ratio of positive to negative in earlier works ranges from 7:1 to 5:1 [5]. This balanced distribution of stock discussion gives us more confidence to study the information content of the positive and negative dimensions of discussion about stock prices on microblogs.
2.1.2 Feature Extraction and Aggregation
Positive and negative tweets from each day are further aggregated into weekly time-domain indicators, the week being the time period under study. We selected the weekly domain over daily, bi-daily, bi-weekly or monthly as it is the most balanced time resolution for studying the effect of investor behavior on model performance accuracy, while keeping the in-market monetization potential practical.

For every week, the value of the security (closing, volatility, volume and weekly returns for each index) is recorded every Friday at the closing time of market trading hours, 21:00 UTC. To explore the relationships both during weekly trading and on the days when the market remains closed (weekends, national holidays), we broadly focus on two domains of tweet sentiments: weekday indices and weekend indices (further referred to as WD and WK respectively). We have carried forward the work of Antweiler et al. in defining bullishness (B_t) for each time domain (time window WD or WK) as given by equation 1.
B_t = \ln\left(\frac{1 + M_t^{Positive}}{1 + M_t^{Negative}}\right) (1)

where M_t^{Positive} and M_t^{Negative} represent the number of
positive or negative tweets during a particular time period 
WD or WK. Logarithm of bullishness measures the share of 
surplus positive signals and also gives more weight to larger 
number of messages in the specific sentiment group (positive 
or negative). Message volume is simply defined as natural 
logarithm of total number of tweets per time domain for a 
specific security/index. And the agreement among positive 
and negative tweet messages is defined as: 
A_t = 1 - \sqrt{1 - \left(\frac{M_t^{Positive} - M_t^{Negative}}{M_t^{Positive} + M_t^{Negative}}\right)^2} (2)

If all the tweet messages about a particular company are positive (bullish about the company stock) or all are negative (bearish about the company stock), agreement is 1. The influence of silent tweet days in our study (trading days when no tweeting happens about a particular company) is less than 0.1%, which is significantly less than in previous works [5][18]. Every market index/security thus has a total of 10 potentially causative time series from Twitter: positive WD, negative WD, bullishness WD, message volume WD and agreement WD, and from the previous weekend positive WK, negative WK, bullishness WK, message volume WK and agreement WK.
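The five Twitter features per time window can be computed from the two tweet counts alone. The sketch below assumes the Antweiler-style definitions: bullishness ln((1 + M_pos) / (1 + M_neg)), message volume ln(M_pos + M_neg), and agreement 1 - sqrt(1 - ((M_pos - M_neg) / (M_pos + M_neg))^2). The function names and the handling of empty windows are illustrative choices, not taken from the paper.

```python
import math


def bullishness(n_pos, n_neg):
    """Log ratio of (1 + positive) to (1 + negative) tweet counts."""
    return math.log((1 + n_pos) / (1 + n_neg))


def message_volume(n_pos, n_neg):
    """Natural log of the total tweet count; 0.0 for an empty window (assumption)."""
    total = n_pos + n_neg
    return math.log(total) if total > 0 else 0.0


def agreement(n_pos, n_neg):
    """1 when all tweets share one sentiment, 0 when they split evenly."""
    total = n_pos + n_neg
    if total == 0:
        return 0.0  # assumption: no tweets means no measurable agreement
    share = (n_pos - n_neg) / total
    return 1 - math.sqrt(max(0.0, 1 - share ** 2))


def twitter_features(n_pos, n_neg):
    """The five features for one window (weekday WD or weekend WK)."""
    return {
        "positive": n_pos,
        "negative": n_neg,
        "bullishness": bullishness(n_pos, n_neg),
        "message_volume": message_volume(n_pos, n_neg),
        "agreement": agreement(n_pos, n_neg),
    }


# Hypothetical counts for one security in one weekday window.
weekday = twitter_features(n_pos=90, n_neg=30)
```

Calling `twitter_features` once for the weekday window and once for the preceding weekend gives the 10 series listed above for each security.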

2.2 Search Volume Index 

To generate a search engine lexicon for each of the five securities under study (oil, DJIA, NASDAQ-100, gold and Euro), we start by collecting weekly search volumes for specific search terms related to the respective sectors, such as oil, GLD, Dow-30, nasdaq, oil price etc., as given in Table 1, from Google Insights for Search. Google provides this open service to access search volume data at a minimum frequency of one week, going back to January 2004. Next, we also take into account the top relevant search terms recommended by Google Insights for Search, thus expanding the existing group of search terms.

To further normalize and better understand the computational results, we apply the dimension reduction technique of principal component analysis (PCA). We are able to reduce the number of variables from the search domain (up to 50 for oil) by combining similarly behaving time series to create mutually uncorrelated factors, Fact 1 and Fact 2. PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components, which reveal the underlying structure responsible for maximum variance. Tables 7, 9, 5, 8 and 6 in the Appendix give the factors extracted using the varimax rotation technique to produce orthogonal factors. To identify the factors that explain maximum variance in the search volume series, we have used the Kaiser criterion, under which the factors with eigenvalues greater than 1 are extracted.
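Factor extraction under the Kaiser criterion can be sketched with plain NumPy as below. The data are synthetic (six hypothetical search-term series driven by two latent trends), the function name is illustrative, and the varimax rotation mentioned above is omitted for brevity.

```python
import numpy as np


def kaiser_factors(series):
    """Return eigenvalues > 1 and the matching component scores.

    series: array of shape (n_weeks, n_terms), one column per search term.
    """
    # Standardize each search-term series, so that PCA acts on the correlation matrix.
    z = (series - series.mean(axis=0)) / series.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0  # Kaiser criterion
    scores = z @ eigvecs[:, keep]  # uncorrelated factor time series
    return eigvals[keep], scores


# Synthetic example: two latent drivers, three noisy search-term series each.
rng = np.random.default_rng(0)
weeks = 60
driver_a = rng.normal(size=weeks)
driver_b = rng.normal(size=weeks)
noise = 0.1 * rng.normal(size=(weeks, 6))
series = np.column_stack([driver_a] * 3 + [driver_b] * 3) + noise

eigvals, scores = kaiser_factors(series)
```

With this synthetic input the criterion keeps two components, playing the role of Fact 1 and Fact 2 in the text.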

Table 1: Google search terms for the 5 securities

Security     | Search terms
US Oil Funds | oil commodity, crude oil, oil etfs, curde oil price, oil futures, oil quotes, oil price per barrel, oil prices bloomberg, wti crude oil, oil prices, how much of oil is left, crude oil ticker + 50 more similar terms etc.
DJIA         | djia, dow jones industrial average, dow jones, dow, s&p 500, Stock Market, stock message board
NASDAQ-100   | nasdaq up, djia today, dow futures quote, futures quote, djia quote, nylc, bank of america dividends
Gold         | buy gold, invest in gold US data, invest in gold worldwide, dollar to pound exchange rate, dollar to pound exchange
Euro         | exchange rates converter, dollar euro exchange rate history, rupee exchange rate, oanda currency, rupee exchange, dollar rupee exchange rate, bloomberg live tv, eurusd



2.3 Financial Market Data 

We have carried out the analysis in five different sectors: oil, DJIA, NASDAQ-100, gold and Euro. Most of the data, including all the VIX indices and the Euro to USD forex rates used for analysis, are collected from the econometric data of the Federal Reserve Bank of St. Louis. Gold price time series are downloaded from the World Gold Council. Weekly time series for US oil funds and weekly index movements in DJIA and NASDAQ-100 are extracted from the Yahoo! Finance API.



^http://www.google.com/insights/search/
^Federal Reserve Economic Data: http://research.stlouisfed.org/fred2/
^http://www.gold.org/investment/statistics/goldpricechart/
^http://finance.yahoo.com/



The financial features (parameters) available from Yahoo Finance under study are the opening (O_t) and closing (C_t) value of the stock/index, the highest (H_t) and lowest (L_t) value, and the volume traded for the stock/index. In addition, returns are defined as the difference between the logarithms of the closing values of the stock index on the week's Friday and the previous week's Friday:

R_t = \{\ln Close_t - \ln Close_{t-1}\} \times 100 (3)

Trading volume is the logarithm of number of traded shares 
every week. We estimate weekly volatility based on intra- 
day highs and lows using Garman and Klass volatility mea- 
sures [g] given by the formula: 

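\sigma = \sqrt{\frac{1}{n} \sum \left[ \frac{1}{2} \left( \ln\frac{H_t}{L_t} \right)^2 - (2\ln 2 - 1) \left( \ln\frac{C_t}{O_t} \right)^2 \right]} (4)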

Further in this section we discuss the various security indices in each of the sectors under study.
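The weekly return and the Garman-Klass volatility estimate can be sketched as follows. The sketch assumes the standard Garman-Klass form, averaging 0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2 over the trading days of the week; the function names and sample prices are illustrative.

```python
import math


def weekly_return(close_t, close_prev):
    """Log return between two consecutive Friday closes, in percent."""
    return (math.log(close_t) - math.log(close_prev)) * 100


def garman_klass_volatility(opens, highs, lows, closes):
    """Garman-Klass volatility over the n trading days supplied."""
    n = len(opens)
    total = 0.0
    for o, h, l, c in zip(opens, highs, lows, closes):
        range_term = 0.5 * math.log(h / l) ** 2
        drift_term = (2 * math.log(2) - 1) * math.log(c / o) ** 2
        total += range_term - drift_term
    return math.sqrt(total / n)


# Hypothetical daily open/high/low/close values for one trading week.
opens = [100.0, 101.0, 102.5, 101.5, 103.0]
highs = [102.0, 103.0, 103.5, 104.0, 104.5]
lows = [99.0, 100.5, 101.0, 101.0, 102.0]
closes = [101.0, 102.5, 101.5, 103.0, 104.0]

ret = weekly_return(closes[-1], 100.0)  # 100.0 stands in for the previous Friday's close
vol = garman_klass_volatility(opens, highs, lows, closes)
```

Because the estimator uses the daily high-low range as well as the open-close move, it can register turbulence inside a week whose Friday-to-Friday return is small.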

2.3.1 Oil 

In this study we have taken USO (United States Oil Fund), an exchange traded fund (ETF) that is one of the most highly traded securities, closely tracks movements of light, sweet crude oil, and is bought and sold on NYSE Arca. We have extracted weekly closing values, volatility and volume parameters from the lexicon. In addition, we have also taken the CBOE Oil volatility index (further referred to as VIX), which is an index measure of the market's expectation of 30-day volatility of crude oil prices.

2.3.2 DJIA 

It is an aggregate of 30 highly traded and influential stocks evenly distributed over all sectors. We have taken weekly returns, volatility and volume as the parameters under study. Further, we have also extracted the CBOE DJIA VIX, which is an indicative measure of fluctuation in 30-day future index sensitivities.

2.3.3 NASDAQ-100

It is an aggregate of the top 100 stocks from the NASDAQ exchange, which indexes the majority of the technological stocks in the market. For this as well we have taken weekly returns, volatility and volume as the parameters under study. In addition, we also extracted the CBOE NASDAQ-100 VIX, which is an indicative measure of 30-day ahead index movements.

2.3.4 Gold 

We have taken the price in US dollars (USD), as it is the most traded currency for gold in the world, to accurately represent search volumes in each country and the related Twitter buzz for the precious metal. Further, we have extracted the Gold ETF VIX from CBOE as an indicator of a month-ahead fear gauge in the price of the precious metal.

2.3.5 Euro 

We have taken only two parameters: the Euro to USD (US dollar) conversion rate at the closing of the market every Friday, and the CBOE Euro ETF VIX as a measure of 30-day market fear for the same.

^http://www.cboe.com/micro/oilvix/introduction.aspx



3. STATISTICAL TECHNIQUES AND RESULTS

In this section we begin the statistical analysis and forecasting performance evaluation for each of the financial securities discussed in Section 2.3, using two dynamic components of investor behavior: 10 features from Twitter as discussed in Section 2.1.2, and 1 or 2 principal factors from Google SVI as discussed in Section 2.2. First we identify correlation patterns across the various time series at different lagged intervals, and then test the causative relationships of SVI and tweet features on the market securities using the econometric technique of Granger causality analysis. Finally, we make use of the expert model mining system (EMMS) to propose and test the forecasting model and draw performance-based conclusions.

3.1 Correlation and Cross-Correlation Analysis

We begin our study by identifying pairwise correlation metrics between the 10 Twitter features for each security index given in Section 2.1.2 and the factors derived from SVI as given in Section 2.2.

3.1.1 Technique 

Once we obtain the Pearson correlation coefficients, we evaluate the lagged response of the relationships between financial features, Twitter sentiments and search volumes by computing the cross-correlation at lags of up to ±7 weeks, to show confidence and effectiveness in the results. It also motivates us to build an accurate forecasting model by picking accurate regressor coefficients.

For any two series x = \{x_1, ..., x_n\} and y = \{y_1, ..., y_n\}, the cross-correlation \gamma at lag k is defined as:

\gamma = \frac{\sum_i (x_{i+k} - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_{i+k} - \bar{x})^2} \sqrt{\sum_i (y_i - \bar{y})^2}} (5)

In equation 5, \bar{x} and \bar{y} are the sample mean values of x and y respectively. The cross-correlation function, ccf(x, y) for short, is an estimate of the linear correlation between x_{t+k} and y_t, which means that keeping the time series y fixed, we move the time series x backward and forward in time by a lag of k, i.e. k = [-7, 7] for lags of 7 weeks in the positive and negative directions. Cross-correlation gives a measure of anticipated values of statistically significant relations in a time series x, which can be made part of the forecasting models discussed ahead.
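A lagged cross-correlation of this kind can be sketched as below: deviations are taken from the full-sample means, the sums run over the overlapping indices, and the lag runs from -7 to 7 weeks. The series are synthetic, with x constructed to follow y by two weeks, and the function name is illustrative.

```python
import numpy as np


def cross_correlation(x, y, k):
    """Correlation between x[i + k] and y[i] over the overlapping indices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_mean, y_mean = x.mean(), y.mean()
    if k >= 0:
        xs, ys = x[k:], y[:n - k]
    else:
        xs, ys = x[:n + k], y[-k:]
    dx, dy = xs - x_mean, ys - y_mean
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Synthetic weekly series: x repeats y two weeks later, so x[i + 2] == y[i].
rng = np.random.default_rng(0)
y = rng.normal(size=60)
x = np.concatenate([np.zeros(2), y[:-2]])

ccf = {k: cross_correlation(x, y, k) for k in range(-7, 8)}
best_lag = max(ccf, key=ccf.get)
```

Scanning the whole lag window in this way shows which series tends to move first, which is the information used when choosing lagged regressors for the forecasting models.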

3.1.2 Results 

Figure 2 (heat map) and Figure 3 (radar maps) present a summarized set of Pearson correlation results for the financial, Twitter and SVI time series after transformation to log scale. The red and blue sections in both figures correspond to statistically significant relationships between the dependent variables.

For Twitter features we examine 5 series, positive, negative, bullishness (Bull), message volume (Msg Vol.) and agreement (Agrmnt), in two cases: weekday (active market trading) and weekend (market-off days). The overall relationships exhibit varying degrees of association. The clear trend we observe is that market-off days do not carry high weight compared with market-active days, but they are still significantly correlated and can potentially be exploited when designing the hedging strategies discussed later in this paper. Weekday bullishness is one of the most important features to look out for in any investment and shows uniformly significant behavior in all the sectors, with a Pearson 'r' as strong as -0.73 for DJIA's weekly volatility. Another interesting trend is that returns in both DJIA and NASDAQ-100 show a negative relationship of varying strength with both positive and negative feeds, indicating heavy discussion, more sensitive to message volume on Twitter, before a fall in the index. A significantly valuable correlation of 0.593 exists between returns and our introduced feature, bullishness, which is the relative measure of positive to negative sentiment of the investor community explained in Section 2.1.2. NASDAQ does not show any relationship with weekend Twitter discussions, on account of the fast and dispersive behavior of news memes among tech-savvy investors in technological stocks, who are expected to respond faster to news. Volatility indices (VIX) for the various securities show a significant negative relation with the weekday agreement index, which is the vector distance between positive and negative discussion about a security and serves as a measure of the turbulence/perceived market risk about to occur in the coming weeks. The exception is DJIA, which consists of only 30 major stocks that are subject to highly balanced, consistent movements due to heavy trading activity across any time domain.

We find stronger correlations of the principal factors from the SVI series, up to 0.826, for commodity funds such as oil and gold and for Euro forex rates than for the index movements of DJIA and NASDAQ-100. This gives the impression that people tend to search more for commodity funds than for stock equities, indicating a better understanding of the control heuristics of actual market movements from investor behavior. From Figure 2 we can see that the VIX is one of the most highly correlated financial features in all 5 cases, and thus may be referred to as a strong measure of investor behavior through a computational gauge of investor fear. Another important relation that we observed for NASDAQ-100 and DJIA is the negative correlation with returns, in contrast to the positive correlation with volatility, volume and VIX. This is indicative of high search behavior being caused by a fall in the index values, increasing trading volume and making the index movements more volatile.



As we can see in Figure 4(a), the highest correlation is exhibited by oil VIX and SVI, and is roughly balanced on both sides, indicating a bi-causative relation in both directions. Similar observations can be made for oil fund and oil VIX against Twitter message volume. Tweet message volume has a stagnant low slope on the negative lag side, which indicates a surge in oil-related discussion on Twitter consistently prior to an actual hike in the price. For DJIA and NASDAQ-100, as observed from Figure 4(b) and (c), much more balanced correlation factors can be observed for the majority of the pairs in both cases. However, for DJIA a significant bend on the negative lag side is observed for volatility in the index at k = -1, indicating a fall of -0.8 in correlation with tweet-based bullishness at least a week before the actual market trading. A similar effect is observed for search volumes up to 4-5 weeks before the actual trading volume increases. NASDAQ-100's correlation activity does not give much insight into relationships between the features, which may be due to non-linear associations or significant relations hidden at smaller time-domain frequencies, given the nature of tech-savvy investors in technological stocks. But we can see a bend on the positive k lag side for volatility with search volumes a week before, and a constant increase in bullishness from 2 weeks before the actual surge in volatility. However, we leave this area for future exploration.

Figure 2: Heatmap showing Pearson correlation coefficients between security indices vs features from Twitter and SVI factors. (Blue and red correspond to significant correlation values in Figure 3)


From Figure 4(d) and (e) we can see balanced correlation for gold prices and Euro conversion rates. However, important conclusions emerge from the behavior of the Gold ETF's VIX, which shows negative correlation one to two weeks prior, indicating an increase in gold-related tweet discussions before a dip in the VIX index occurs. It also shows negative correlation at positive lag with search volumes. We thus observe a dip in the VIX index (fear of buying gold) caused by increased discussion on Twitter as investors consider gold a safe investment, hence the confounding effect further observed in the search volumes.

3.2 Granger Causality Analysis 

GCA rests on the assumption that if a variable X causes Y, then changes in X will systematically occur before the changes in Y. Lagged values of X should therefore bear significant correlation with Y. However, correlation does not necessarily imply causation. Like the earlier approaches in [1], we have made use of GCA to investigate whether one time series is significant in predicting another time series. GCA is used not to establish actual causality, but as an econometric tool to investigate a statistical pattern of lagged correlation. A similar observation is that the statement that smoking causes lung cancer is widely accepted because smoke is proven to contain carcinogens, yet smoking by itself may not be the actual cause of the real event, i.e. cancer in this case.

3.2.1 Technique 

Figure 3: Radar maps showing Pearson correlations of Twitter and SVI features vs commodities like oil

Figure 4: Cross-correlation of Twitter and SVI features vs commodities like oil (a) and gold (d), stock indices like DJIA (b) and NASDAQ-100 (c), and the forex rate of Euro (e)

Let returns R_t be reflective of fast movements in the stock market. To verify the change in returns with the change in Twitter features, we compare the variance explained by the following linear models in equation 6 and equation 7:

R_t = \alpha + \sum_{i=1}^{n} \gamma_i R_{t-i} + \epsilon_t (6)

R_t = \alpha + \sum_{i=1}^{n} \gamma_i R_{t-i} + \sum_{i=1}^{n} \beta_i X_{t-i} + \epsilon_t (7)

Equation 6 uses only the n lagged values of R_t, i.e. (R_{t-1}, ..., R_{t-n}), for prediction, while equation 7 uses the n lagged values of both R_t and the tweet feature time series, given by X_{t-1}, ..., X_{t-n}. We have taken a weekly time window to validate the causality performance, hence the lag values are calculated over the weekly intervals 1, 2, ..., 7.
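A Granger-style comparison of a restricted model (lagged returns only) with a full model (lagged returns plus lagged feature values) can be sketched with ordinary least squares as below. This is an illustrative NumPy implementation on synthetic data, not the authors' code; it returns the F statistic and its degrees of freedom, from which a p-value would follow via the F distribution.

```python
import numpy as np


def lagged_matrix(series, n_lags):
    """Columns hold series[t-1], ..., series[t-n_lags] for t = n_lags .. len-1."""
    total = len(series)
    return np.column_stack(
        [series[n_lags - i: total - i] for i in range(1, n_lags + 1)]
    )


def granger_f_statistic(r, x, n_lags):
    """F statistic for 'lags of x add nothing beyond lags of r' when predicting r."""
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    target = r[n_lags:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged_matrix(r, n_lags)])
    full = np.hstack([restricted, lagged_matrix(x, n_lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_restricted, rss_full = rss(restricted), rss(full)
    dof = len(target) - full.shape[1]
    f_stat = ((rss_restricted - rss_full) / n_lags) / (rss_full / dof)
    return f_stat, n_lags, dof


# Synthetic weekly data: returns follow the feature with a one-week delay.
rng = np.random.default_rng(1)
x = rng.normal(size=70)
noise = rng.normal(size=70)
r = np.empty(70)
r[0] = noise[0]
r[1:] = 0.8 * x[:-1] + 0.3 * noise[1:]

f_forward, df_num, df_den = granger_f_statistic(r, x, n_lags=2)
f_reverse, _, _ = granger_f_statistic(x, r, n_lags=2)
```

A large `f_forward` together with a small `f_reverse` is the pattern one would expect when the feature leads returns but not the other way round.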

3.2.2 Results 

From Table 2 we can reject the null hypothesis (H0) that the SVI and Twitter investor behavior do not affect returns in the financial markets, i.e. we find \beta_{1,2,...,n} \neq 0, with a high level of confidence (low p-values). However, as we see, the result applies only to specific negative and positive tweets († for p-value < 0.05 and * for p-value < 0.1, which correspond to 95% and 90% confidence levels respectively). Other features like agreement and message volume do not have a significant causal relationship with the returns of a stock index (high p-values).

In Table 2 we can see that at a lag of one week, almost all the features are significant in predicting changes in the financial features of oil, DJIA, NASDAQ-100, gold and Euro. However, as we go in the positive lag direction from 1 to 4 weeks, the significance decreases, showing the Twitter and SVI mood series to be Granger causative of the financial features. SVI shows uniformly low p-values, i.e. confidence of up to 99%, for almost all the sectors, both indices (DJIA, NASDAQ-100) and commodities (gold, oil and the forex rate of Euro). Twitter features, especially for the indices DJIA and NASDAQ-100, do not show significance beyond 2-3 weeks, indicating the dispersive nature of information entropy on social networks in contrast to the SVI factors.

3.3 EMMS model for Forecasting Analysis of Financial features

In this section we address the perennial questions of how much, and how well, the features proposed in the earlier sections can help make accurate forecasts of financial indicators. For this purpose we have used the Expert Model Mining System (EMMS), which incorporates a set of competing methods such as Exponential Smoothing (ES), Auto Regressive Integrated Moving Average (ARIMA) and seasonal ARIMA models. These methods are widely used in financial modeling to predict the values of stocks/bonds/commodities etc. [13][5]. They are suitable for constant level, additive trend or multiplicative trend, with either no seasonality, additive seasonality or multiplicative seasonality.

3.3.1 Technique 

The selection criteria for the EMMS are MAPE and stationary R-squared, the latter being a measure of how good the model under consideration is compared with the baseline model [17]. The stationary R-squared can be negative, with range (-∞, 1]. A

^Lag at k for any parameter M at week x_t is the value of the parameter at week x_{t-k}. For example, the value of returns for the month of April at a lag of one month is return_{april-1}, which is return_{march}.



Table 2: Granger causality analysis: statistical significance (p-values) at lags of 1, 2, 3 and 4 weeks between financial indicators and features of investor behavior (p-value < 0.01: ‡, p-value < 0.05: †, p-value < 0.1: *)



Securities 


Lag 














1 


2 


3 


4 



Oil 



DJIA 



Nasdc 



Gold 



Euro 



Close 



VIX 



Retu] 



VIX 



Retu] 



VIX 



Price 



VIX 



EURU; 



VIX 



Positive 
Negative 
Bull 
Msg Vol 
Agreement 

SVI 

Positive 

Negative 

Bull 

Msg Vol 

Agreement 

SVI 



Positive 
Negative 
Bull 

sg Vol 
Agreement 

SVI 

Positive 

Negative 

Bull 

Msg Vol 

Agreement 

SVI 



Positive 
Negative 
Bull 

sg Vol 
Agreement 

SVI 

Positive 

Negative 

Bull 

Msg Vol 

Agreement 

SVI 



Positive 
Negative 



sg Vol 
Agreement 

SVI 

Positive 

Negative 

Bull| 

Msg Vol 

Agreement 

SVI 



Positive 
Negative 
ull 
sg 
Vol 

Agreement 
SVI 



Positive 

Negative 

Bull 

Msg 

Vol 

Agreement 
SVI 



[Table 2 p-values, grouped as extracted; the assignment of values to
table cells is not recoverable.]

.009† .014† .014† .0181 0.061* 0.038† 0.048† 0.6 0.032* 0.078* 0.008†
,1* 352 25 ,05† 521 201
0.755 0.666 0.77 0.911 0.89 O.OOGt
0.238 0.204 0.238 0.397 0.421 o.ooij:
.966 .683 819 701 804 0.00002^0.001‡
0.454 0.303 0.364 0.706 0.411 o.oart
0.746 0.621 0.742 0.949 0.957 0.07*
0.675 0.065* 0.38 0.052* 0.264 0.021†
601 .056* 442 608 243 .053*
0.986 0.996 0.991 0.947 0.826 0.021†
0.266 0.331 0.305 0.237 0.552 0.057*
0.461 0.024* 0.38 0.033* 0.427 0.015†
501 286 527 05* 436 ,06*
0.936 0.91 0.672 0.666 0.616 0.017†
0.683 0.388 0.583 0.97 0.752 0.03†
0.088* 0.737 0.061* 0.253 0.091* 0.091* 0.076* 0.001‡ 0.043† 0.179 0.019† 0.0002‡
017† ,017† 024† 218 31 081*
086* 31 241 427 229 ,002‡
0.049† 0.076* 0.213 0.043 0.988 0.064* 0.025† 0.893 0.021† 0.024† 0.278 0.02†
0.1* 0.064* 0.136 0.473 0.245 0.091* 0.042† 0.128 0.04† 0.148 0.093* 0.054*
0.136 0.56 0.004‡ 0.027† 0.035† 0.0001‡ 0.083* 0.1* 0.385 0.793 0.1* 0.414
331 712 ,023† ,1* ,009‡ ,00034‡
0.631 0.807 0.028† 0.625 0.015† 0.00041‡
.11* 454 641 .1* 385 .059*
0.192 0.1* 0.509 0.305 0.493 0.057*
0.41 0.66 0.058* 0.557 0.045† 0.001‡ 0.05† 0.033† 0.755 0.256 0.184 0.05†
0.051* 0.11*
0.043* 0.51
0.069* 0.754
0.1* 0.439
0.1* 0.249 0.521 0.1*
0.336 0.561 0.497 0.157
0.944 0.985 0.62 0.399
0.00001‡ 0.00006‡ 0.00008‡ 0.0001
OTP^ 0.085*
0.028† O.Ollt
0.498 0.1*
0.443 0.256
0.384 0.091*
0.587 0.0001‡
0.092* 0.034† 0.1* 0.987 0.55 0.002‡
0.431 0.068 0.797 0.213 0.557 0.003‡



negative R-squared value means that the model under consideration
is worse than the baseline model, a zero R-squared means that it is
as good or bad as the baseline model, and a positive R-squared means
that it is better than the baseline model. Mean absolute percentage
error (MAPE) is the mean of the absolute residuals (the differences
between fitted and observed values), expressed as a percentage of
the observed values. To show the contribution of the tweet features
to the prediction model, we apply the EMMS twice: first with the SVI
and Twitter sentiment features as independent predictor events, and
a second time without them. This provides a quantitative comparison
of the improvement in prediction due to the tweet features.
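
The sign interpretation of a baseline-relative R-squared can be illustrated with a small sketch. This is not necessarily the exact stationary R-squared computed by the EMMS software; it is a generic measure, 1 - SSE(model) / SSE(baseline), that shares the range (-∞, 1] and the interpretation described above. The function name and the sample series are hypothetical.

```python
def baseline_relative_r2(observed, model_fit, baseline_fit):
    """Return 1 - SSE(model) / SSE(baseline).

    Positive: model beats the baseline. Zero: equally good.
    Negative: model is worse than the baseline.
    """
    if not (len(observed) == len(model_fit) == len(baseline_fit)):
        raise ValueError("all series must have the same length")
    sse_model = sum((o - m) ** 2 for o, m in zip(observed, model_fit))
    sse_baseline = sum((o - b) ** 2 for o, b in zip(observed, baseline_fit))
    if sse_baseline == 0:
        raise ValueError("baseline fits perfectly; the ratio is undefined")
    return 1.0 - sse_model / sse_baseline


# Hypothetical data: the baseline simply predicts the series mean.
observed = [10.0, 12.0, 11.0, 13.0]
baseline = [11.5, 11.5, 11.5, 11.5]
better = [10.5, 11.5, 11.0, 12.5]
worse = [14.0, 8.0, 15.0, 9.0]

r2_better = baseline_relative_r2(observed, better, baseline)
r2_same = baseline_relative_r2(observed, baseline, baseline)
r2_worse = baseline_relative_r2(observed, worse, baseline)
print(r2_better, r2_same, r2_worse)
```

The measure is unbounded below because a model can be arbitrarily worse than the baseline, but it cannot exceed 1, which corresponds to a perfect fit.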

ARIMA(p,d,q) models are, in theory and in practice, the most general
class of models for forecasting a time series that can be made
stationary by transformations such as differencing or taking the
logarithm of the series $Y_i$. For a non-seasonal ARIMA(p,d,q) model,
p is the number of autoregressive terms, d is the number of
non-seasonal differences and q is the number of lagged forecast
errors in the prediction equation. A stationary time series
$\Delta^d Y_i$, obtained by differencing d times, has the stochastic
component:

$$\Delta^d Y_i \sim \mathrm{Normal}(\mu_i, \sigma^2) \qquad (8)$$

where $\mu_i$ and $\sigma^2$ are the mean and variance of the normal
distribution, respectively. The systematic component is modeled as:

$$\mu_i = \alpha_1 \Delta^d Y_{i-1} + \dots + \alpha_p \Delta^d Y_{i-p} + \theta_1 \epsilon_{i-1} + \dots + \theta_q \epsilon_{i-q} \qquad (9)$$

where $\Delta^d Y$ are the lag-p observations from the stationary time
series with associated parameter vector $\alpha$, and $\epsilon_i$ are
the lagged errors of order q with associated parameter vector
$\theta$. The expected value is the mean of simulations from the
stochastic component,

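$$E(Y_i) = \mu_i = \alpha_1 \Delta^d Y_{i-1} + \dots + \alpha_p \Delta^d Y_{i-p} + \theta_1 \epsilon_{i-1} + \dots + \theta_q \epsilon_{i-q} \qquad (10)$$

Table 3: Forecasting results for the financial securities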



A seasonal ARIMA model is of the form ARIMA(p,d,q)(P,D,Q), where P
specifies the seasonal autoregressive order, D is the seasonal
differencing order and Q is the seasonal moving average order.
Another advantage of the EMMS model is that it is a stepwise
forecasting process which automatically selects the most significant
predictors among all the Twitter sentiment series and SVI features.
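
As an illustration of the non-seasonal systematic component, the sketch below differences a series d times and then combines the most recent differenced observations and forecast errors into the mean of the next step. The coefficient values, the error history and the function names are hypothetical; estimating the coefficients is left to the forecasting software and is not shown.

```python
def difference(series, d=1):
    """Apply first differencing d times to a list of observations."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def systematic_component(diffed, errors, alphas, thetas):
    """Mean of the next differenced value.

    diffed: differenced observations, oldest first
    errors: past forecast errors, oldest first
    alphas: autoregressive coefficients alpha_1..alpha_p
    thetas: moving average coefficients theta_1..theta_q
    """
    p, q = len(alphas), len(thetas)
    if len(diffed) < p or len(errors) < q:
        raise ValueError("not enough history for the requested orders")
    # alpha_j multiplies the observation j steps back, theta_k the error k steps back.
    ar_part = sum(a * diffed[-j] for j, a in enumerate(alphas, start=1))
    ma_part = sum(t * errors[-k] for k, t in enumerate(thetas, start=1))
    return ar_part + ma_part


# Hypothetical weekly prices and coefficients for an ARIMA(2,1,1) model.
prices = [100.0, 102.0, 101.0, 105.0]
diffed = difference(prices, d=1)          # [2.0, -1.0, 4.0]
mu_next = systematic_component(
    diffed, errors=[0.0, 1.0], alphas=[0.5, 0.2], thetas=[0.3]
)
print(mu_next)
```

The forecast on the original scale is obtained by adding the predicted difference back to the last observed value.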

3.3.2 Results 

The model equations for the two cases are given below as equation
[11] for forecasting without predictors and equation [12] for
forecasting with predictors. In these equations Y is the financial
feature (oil, gold, DJIA, etc.) and X represents the investor mood
series from the SVI and Twitter features.

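Without predictors:

$$Y_t = \alpha + \sum_{i=1}^{n} \beta_i Y_{t-i} + \epsilon_t \qquad (11)$$

With predictors:

$$Y_t = \alpha + \sum_{i=1}^{n} \beta_i Y_{t-i} + \sum_{i=1}^{n} \gamma_i X_{t-i} + \epsilon_t \qquad (12)$$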
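
The difference between the two model equations lies only in the regressors. A minimal sketch of how the lagged regressors could be assembled is given below, assuming a single predictor series X and the same number of lags n for Y and X; the function names, sample data and coefficient values are hypothetical, and coefficient estimation is not shown.

```python
def lagged_design(y, n_lags, x=None):
    """Build regression rows [1, Y_{t-1..t-n}, (X_{t-1..t-n})] with targets Y_t.

    Without x the rows match the model without predictors;
    with x they match the model with predictors.
    """
    if x is not None and len(x) != len(y):
        raise ValueError("x and y must cover the same weeks")
    if n_lags < 1 or len(y) <= n_lags:
        raise ValueError("need more observations than lags")
    rows, targets = [], []
    for t in range(n_lags, len(y)):
        row = [1.0]  # intercept term (alpha)
        row += [y[t - i] for i in range(1, n_lags + 1)]      # beta terms
        if x is not None:
            row += [x[t - i] for i in range(1, n_lags + 1)]  # gamma terms
        rows.append(row)
        targets.append(y[t])
    return rows, targets


def predict(row, coefficients):
    """Evaluate alpha + sum(beta_i * Y_{t-i}) + sum(gamma_i * X_{t-i})."""
    if len(row) != len(coefficients):
        raise ValueError("one coefficient is needed per regressor")
    return sum(c * v for c, v in zip(coefficients, row))


# Hypothetical weekly series: y is a price, x is a mood feature.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [10.0, 20.0, 30.0, 40.0, 50.0]
rows_without, targets = lagged_design(y, n_lags=2)
rows_with, _ = lagged_design(y, n_lags=2, x=x)
forecast = predict(rows_with[0], [0.5, 1.0, 0.25, 0.01, 0.0])
print(rows_without[0], rows_with[0], forecast)
```

Any least-squares routine can then be fitted to the rows and targets of each design, which makes the two models directly comparable on the same weeks.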

In the dataset we have time series for a total of 66 weeks, out of
which we use approximately 76%, i.e. 50 weeks, for training both the
models given in equations [11] and [12] (for the time period 2nd
June 2010 to 27th May 2011). Further, we verify the model performance
as one-step-ahead forecasts over the testing period of 16 weeks from
30th May to 13th September 2011, which accounts for a wide and robust
range of


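Market Securities | Indicator | Predictors | MAPE   | Direction
US Oil Funds      | Index     | Yes        | 2.3202 | 75
US Oil Funds      | Index     | No         | 2.4203 | 62.5
US Oil Funds      | VIX       | Yes        | 4.5592 | 75
US Oil Funds      | VIX       | No         | 5.1218 | 56.3
DJIA              | Index     | Yes        | 0.8557 | 94.3
DJIA              | Index     | No         | 1.1698 | 60
DJIA              | VIX       | Yes        | 5.3017 | 82.9
DJIA              | VIX       | No         | 5.6943 | 62.9
NASDAQ-100        | Index     | Yes        | 1.3235 | 90
NASDAQ-100        | Index     | No         | 1.3585 | 50
NASDAQ-100        | VIX       | Yes        | 3.2415 | 83.3
NASDAQ-100        | VIX       | No         | 5.7268 | 50
Gold              | USD       | Yes        | 1.5245 | 78.6
Gold              | USD       | No         | 1.5555 | 64.3
Gold              | VIX       | Yes        | 0.2534 | 71.9
Gold              | VIX       | No         | 5.2724 | 56.1
Euro              | EURUSD    | Yes        | 2.6224 | 74.1
Euro              | EURUSD    | No         | 4.3541 | 58.6
Euro              | VIX       | Yes        | 4.4124 | 69
Euro              | VIX       | No         | 4.7878 | 53.4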



market conditions. Forecasting accuracy in the testing period is
compared for both the models in each case in terms of mean absolute
percentage error (MAPE) and direction accuracy. MAPE is given by
equation [13], where $\hat{y}_i$ is the predicted value and $y_i$ is
the actual value.

$$\mathrm{MAPE} = \frac{\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|}{n} \times 100 \qquad (13)$$

Direction accuracy is a measure of how accurately the up/down
movement of a market or commodity is predicted by the model. It is
technically defined as the logical value of
$(y_{i,t+1} - y_{i,t}) \times (\hat{y}_{i,t+1} - y_{i,t}) > 0$. This
is of prime importance to high-frequency traders and to investors who
hedge their investments in derivative markets, as many prices (option
premiums, bonds, etc.) are determined solely by the direction of the
moving index or price.
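
Both evaluation measures are simple to compute. The sketch below is one possible implementation, assuming that actual values are non-zero and that the predicted value at each step is a one-step-ahead forecast made from the previous actual value; the function names and sample data are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def direction_accuracy(actual, predicted):
    """Percentage of steps where the predicted move has the sign of the actual move.

    A step t counts as a hit when
    (actual[t+1] - actual[t]) * (predicted[t+1] - actual[t]) > 0.
    """
    steps = len(actual) - 1
    if steps < 1 or len(predicted) != len(actual):
        raise ValueError("need at least two aligned observations")
    hits = 0
    for t in range(steps):
        true_move = actual[t + 1] - actual[t]
        predicted_move = predicted[t + 1] - actual[t]
        if true_move * predicted_move > 0:
            hits += 1
    return 100.0 * hits / steps


# Hypothetical weekly index values and two sets of forecasts.
actual = [100.0, 110.0, 105.0]
good = [100.0, 108.0, 107.0]   # both moves predicted in the right direction
bad = [100.0, 108.0, 112.0]    # second move predicted up, actual went down

print(mape(actual, good), direction_accuracy(actual, good))
print(mape(actual, bad), direction_accuracy(actual, bad))
```

Note that a forecast can have a low MAPE and still miss the direction, which is why the two measures are reported side by side.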

As given in Table [3], there is a significant reduction in MAPE for
all the securities when the forecasting model uses the predictor
sentiment and SVI series, compared to the model without these
predictor series. For the index values of DJIA, a direction accuracy
of up to 94.3% is achieved, while for NASDAQ-100 it is 90%. SVI and
the measure of the wisdom of the crowd on Twitter give quite a robust
picture of how the changing dynamics of public opinion can reflect
market movements that will happen in the near future.

4. DISCUSSIONS 

From Table [4] we can see that earlier works in the area of
behavioral finance were limited to the profile of mood states and
dimensions of public mood in the context of investing. The primary
objective of this work is to bring out a uniform model that combines
search volume behavior with what people are saying on Twitter, and to
observe how strong or accurate these effects are over an increasing
time lag. The use of bullishness, agreement and message volume,
although non-linearly dependent on each other, provides additional
features to measure sentiment in a subjective manner and also
provides a better understanding of variable importance. As seen in
Table [3], we observe one of the most significant



Table 4: Comparison with prior work in sentiment analysis for
predicting markets

Previous Approaches | Bollen et al. [1] [12] and Gilbert et al. [7] | Sprenger et al. [18] | Our Approach
Approach | Mood of complete Twitter feed | Stock discussion with ticker $ on Twitter | Combining Twitter sentiment + Google search volumes
Dataset | 28th Feb 2008 to 19th Dec 2008, 9M tweets sampled as 1.5% of Twitter feed | 1st Jan 2010 to 30th June 2010, 0.24M tweets | 2nd June 2010 to 13th Sept 2011, 1.9M tweets through search API
Technique | SOFNN, Granger's and linear models | OLS regression and correlation | Cross-correlation, GCA, Expert Model Mining System (EMMS)
Results | 86.7% directional accuracy for DJIA | Correlation values up to 0.41 for S&P 100 stocks | Correlation up to 0.82 for Oil, DJIA, NASDAQ-100, Gold and Euro; directional accuracy up to 94%
Feedback / Drawbacks | Individual modeling for stocks not feasible | News not taken into account, very low tweet volumes | Comprehensive and customizable approach



improvements for NASDAQ-100's VIX (MAPE 3.2415) and Gold VIX (MAPE
0.2534), indicating that tech-savvy investors who tweet a lot hold
significant power over index movements. Comparing the general
prediction performance of the behavior features (SVI + Twitter
sentiment series) for market indices and commodity prices against
that for the VIX index, better performance can be observed for VIX,
which suggests that these behavior features are better indicators of
investor fear before the actual price movement occurs in the stock.
However, for the forex price of the Euro, investor sentiment is a
more central factor in controlling the price movement than it is for
the VIX index. Modeling market sentiment is an attractive area that
investors look forward to using for hedging investment instruments.
Our results show no clear uniform pattern in the relationships
between the various elements of investor behavior and stocks over a
wide spectrum of market securities and indices. Hence general
conclusions regarding the complexities of the performance cannot be
drawn. We actively look forward to a future in which fully automated
bots are advanced enough to understand all the behavioral mood
dimensions of location-specific discussions of what an investor is
saying and to make successful investments at minute time scales.
There are always limitations to all predictive approaches. As the
statistician Box rightly said, 'all models are wrong, but some are
useful'. Given the limitations of natural language processing
techniques, there exists a multitude of problems associated with
higher mood states and learning extensions to other algorithms [8].
We also observe that in some cases there is a fall in returns due to
an excessive rise in the bullish (positive) sentiment for a commodity
or index. However, some variation exists for product companies like
EBay, Dell, etc., due to excessive discussion related to product
offerings and promoted messages instead of general discussion by
people.



5. CONCLUSION AND FUTURE WORK 

The proposed approach combines the advantages of sophisticated
statistical and linguistic summarization techniques. Such methods
are able to capture a good picture of both the changing rates and the
rise and fall probabilities for both commodities and stocks. The
proposed model is more accurate than earlier works, which were
restricted to the mood states of the entire Twitter feed applied in
general to the market index [1] [7] [18]. We have validated against a
larger tweet base, over a longer time period, with a larger number of
financial market instruments and with greater prediction accuracy
than any of the earlier works. Moreover, as far as practical
implementation is concerned, our approach helps to improve forecasts
not only of index movements but also of the present volatility and
the VIX index, which is the measure of the 30-day-ahead market fear.
More importantly, it can also be used to determine portfolio
adjustment decisions, like the ratio of risk to security for hedging,
with greater confidence.

7. APPENDIX

The PCA component matrices for Oil, DJIA, NASDAQ-100, Gold and Euro
are given in Tables [6], [7], [8], [5] and [9] respectively. Feature
reduction is an important step before the development of any model,
so as to increase the predictive accuracy, simplicity and
comprehensibility of the mined results. The effect of many search
terms can be concisely mapped to one or two factors, i.e. the
original high-dimensional data is projected onto a lower-dimensional
space. The new PCA factors are uncorrelated, are ordered by the
fraction of the total information each retains, and are filtered on
the basis of the Kaiser criterion, with a threshold eigenvalue
greater than 1. Each of the search term factors (Fact 1 and Fact 2)
explains a significant amount of the variance in the original feature
set of search keywords given in the tables below.
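
The Kaiser filtering step can be sketched as follows, assuming the eigenvalues of the correlation matrix of the search-term series have already been computed by a PCA routine. The function name and the eigenvalues below are hypothetical and are not taken from the study.

```python
def kaiser_select(eigenvalues, threshold=1.0):
    """Keep factors whose eigenvalue exceeds the threshold.

    Returns the retained eigenvalues in descending order together with
    the fraction of total variance each retained factor explains.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    ranked = sorted(eigenvalues, reverse=True)
    kept = [ev for ev in ranked if ev > threshold]
    explained = [ev / total for ev in kept]
    return kept, explained


# Hypothetical eigenvalues for four search terms (they sum to 4,
# the number of variables, as for any correlation matrix).
eigenvalues = [0.2, 2.5, 0.1, 1.2]
kept, explained = kaiser_select(eigenvalues)
print(kept, explained, sum(explained))
```

With these illustrative numbers two factors survive the threshold and together retain most of the variance, which corresponds to the Fact 1 and Fact 2 columns in the tables.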



Table 5: Vector Matrix for Gold SVI Factors (Only 1 factor for gold,
as the search terms for gold fall on the same dimension plane in the
feature vector map.)

Search Term                   | Fact 1
buy gold                      | .575
invest in gold US data        | .942
invest in gold worldwide      | .884
dollar to pound exchange rate | .905
dollar to pound exchange      | .904

Table 6: Vector Matrix for Oil SVI Factors

Search Terms         | Fact 1 | Fact 2
oil commodity        | .U I 1 | .uoo
crude oil etf        | '^l Q  |
oil funds            | Qfin   | .yuu
oil etf              | .oUo   |
oil quotes           | .593   | .599
oil prices per barrel | .853  | .338
spot oil prices      | .727   | .533
wti crude            | .361   | .504
how much oil is left | .452   | .446
futures price        | .521   | .759
how to buy oil       | .298   | .918
oil ticker           | .634   | .655
current oil          | .741   | .021
crude oil futures    | .441   | .639
crude oil price      | .442   | .700



Table 7: Vector Matrix for DJIA SVI Factors

Search Terms                 | Fact 1 | Fact 2
djia                         | .931   | .038
dow jones industrial average | .811   | .495
dow jones                    | .929   | .116
dow                          | .966   | .010
s&p 500                      | .635   | -.032
Stock Market                 | .689   | -.337
stock message board          | -.012  | .936



Table 8: Vector Matrix for Nasdaq SVI Factors

Search Terms              | Fact 1 | Fact 2
nasdaq today              | .833   | .398
dow futures quote         | .888   | .239
futures quote             | .673   | -.004
NASDAQ quote              | .821   | .326
nylc                      | .257   | .936
bank of america dividends | .161   | .947



Table 9: Vector Matrix for Euro SVI Factors

Search Terms                      | Fact 1 | Fact 2
exchange rates converter          | .530   | -.383
dollar euro exchange rate history | .798   | .079
rupee exchange rate               | -.047  | .063
oanda currency                    | .105   | .810
rupee exchange                    | .929   | -.102
dollar rupee exchange rate        | .938   | .043
bloomberg live tv                 | .828   | -.228
eurusd                            | -.237  | .741



